Fit Gamma factor

This notebook estimates the gamma factor from a set of 5 μs-ALEX smFRET measurements.

What this notebook does?

According to Lee 2005 (PDF, SI PDF), we estimate the $\gamma$-factor from Proximity Ratio (PR) and S values (with background, leakage and direct excitation correction) for a set of 5 μs-ALEX measurements.

The PR and S values are computed by the template notebook:

usALEX-5samples-PR-leakage-dir-ex-all-ph

which is executed for each sample using usALEX-Batch.

From Lee 2005 (equation 20), the following linear relation holds:

$$\frac{1}{S} = \Omega + \Sigma \cdot E_{PR}$$

Once $\Omega$ and $\Sigma$ are fitted, we can compute the $\gamma$-factor as (equation 22):

$$\gamma = (\Omega-1)/(\Omega + \Sigma-1)$$

Import libraries



In [3]:

    
from __future__ import division
import numpy as np
import pandas as pd
import lmfit
from scipy.stats import linregress

Computation

This notebook read data from the file:



In [6]:

    
data_file = 'results/usALEX-5samples-PR-leakage-dir-ex-all-ph.txt'



In [7]:

    
data = pd.read_csv(data_file, sep="\s+").set_index('sample')
data









    Out[7]:






  
    
      
      n_bursts_all
      n_bursts_do
      n_bursts_fret
      E_pr_fret
      E_pr_fret_sig
      S_pr_fret
      S_pr_fret_sig
      E_pr_do_kde
      nt_mean
      E_pr_fret_kde
    
    
      sample
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      7d
       1169
       582
        544
       0.926413
       0.002498
       0.553918
       0.004235
       0.0014
       22.265796
       0.9302
    
    
      12d
       1304
       329
        945
       0.731053
       0.003017
       0.565995
       0.003207
       0.0164
       21.905752
       0.7432
    
    
      17d
       2483
       462
       1958
       0.427139
       0.002573
       0.543997
       0.002362
       0.0126
       21.094356
       0.4318
    
    
      22d
       2051
       319
       1670
       0.183379
       0.001969
       0.549327
       0.002511
       0.0008
       22.987313
       0.1796
    
    
      27d
        787
       160
        586
       0.083907
       0.002890
       0.562717
       0.004169
      -0.0068
       16.941794
       0.0844



In [8]:

    
data[['E_pr_fret', 'E_pr_fret_kde']]









    Out[8]:






  
    
      
      E_pr_fret
      E_pr_fret_kde
    
    
      sample
      
      
    
  
  
    
      7d
       0.926413
       0.9302
    
    
      12d
       0.731053
       0.7432
    
    
      17d
       0.427139
       0.4318
    
    
      22d
       0.183379
       0.1796
    
    
      27d
       0.083907
       0.0844



In [9]:

    
res = linregress(data.E_pr_fret_kde, 1/data.S_pr_fret)
slope, intercept, r_val, p_val, stderr = res

For more info see scipy.stats.linearregress.



In [10]:

    
Sigma = slope 
Sigma









    Out[10]:





-0.009993289402355424



In [11]:

    
Omega = intercept
Omega









    Out[11]:





1.8063091259676307

Pearson correlation coefficient:



In [10]:

    
r_val









    Out[10]:





0.030997465223121211

Coefficient of determination $R^2$:



In [11]:

    
r_val**2









    Out[11]:





0.00096084285025860891

P-value (to test the null hypothesis that the slope is zero):



In [12]:

    
p_val









    Out[12]:





0.96053912269852137

Gamma computed from the previous fitted values:



In [12]:

    
gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma









    Out[12]:





1.012549404323615



In [13]:

    
with open('results/usALEX - gamma factor - all-ph.txt', 'w') as f:
    f.write(str(gamma))

Corrected FRET values

With the knoledge of $\gamma$ we can compute the corrected FRET efficiencies.

Let define the function to apply gamma correction (from burstlib.py):



In [14]:

    
def gamma_correct_E(Er, gamma):
    """Apply gamma correction to the uncorrected FRET `Er`."""
    Er = np.asarray(Er)
    return Er/(gamma-gamma*Er+Er)

The corrected values are:



In [15]:

    
E_alex = gamma_correct_E(data.E_pr_fret, gamma)
E_alex









    Out[15]:





array([ 0.92456139,  0.73109077,  0.42336969,  0.18263484,  0.07914672])



In [16]:

    
data.E_pr_fret









    Out[16]:





sample
7d        0.924344
12d       0.730478
17d       0.422609
22d       0.182170
27d       0.078920
Name: E_pr_fret, dtype: float64

Non-linear least squares, using Lmfit

With a non-linear least-squares minimization we are not limited to fit a linear relation.

Let's rewrite the relation without inverting $S_R$:

$$S_R = \frac{1}{\Omega + \Sigma \cdot E_{PR}}$$



In [17]:

    
import lmfit
from lmfit import minimize, Parameters, report_fit, report_ci, report_errors



In [18]:

    
def residual(params, Er, Sr, eps_data=None):
    slope = params['slope'].value
    intercept = params['intercept'].value

    model = 1./(intercept + slope*Er)
    if eps_data is None:
        eps_data = np.ones_like(Er)
    return (Sr - model)/eps_data

params = Parameters()
params.add('intercept', value=2)
params.add('slope', value=0.)

out = minimize(residual, params, args=(data.E_pr_fret, data.S_pr_fret))



In [19]:

    
out.values









    Out[19]:





{'intercept': 1.7406596630381004, 'slope': 0.0030796355446684408}



In [20]:

    
report_fit(params)









    



[[Variables]]
     intercept:     1.74066 +/- 0.02472581 (1.42%) initial =  2
     slope:         0.003079636 +/- 0.0436766 (1418.24%) initial =  0
[[Correlations]] (unreported correlations are <  0.100)



In [21]:

    
Omega = out.values['intercept']
Sigma = out.values['slope']

gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma









    Out[21]:





0.99585925397442776

We can compute the confidence intervals (method details) and derive the effect on $\gamma$:



In [22]:

    
ci = lmfit.conf_interval(out)
report_ci(ci)









    



/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=-0.197920364455) = 0.980791305555 < max(sigmas).
  warn(errmsg)
/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=0.204079635545) = 0.98002107331 < max(sigmas).
  warn(errmsg)






    



             99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
    slope      -inf  -0.13579  -0.04813   0.03940   0.05448   0.14337       inf
intercept   1.53797   1.66423   1.71197   1.74066   1.77002   1.82203   1.98211

Let's compute gamma in the 95% confidence interval for the 2 parameters using a simple brute-force method:



In [23]:

    
gamma_list = []
for sig_err in np.linspace(ci['slope'][1][1], ci['slope'][5][1], 20):
    for omega_err in  np.linspace(ci['intercept'][1][1], ci['intercept'][5][1], 20):
        Omega = out.values['intercept'] + omega_err
        Sigma = out.values['slope'] + sig_err

        gamma = (Omega - 1)/(Omega + Sigma - 1)
        gamma_list.append(gamma)



In [24]:

    
np.min(gamma_list), np.max(gamma_list)









    Out[24]:





(0.94259831075324385, 1.0584055044039404)

NOTE: Using the 95% confidence interval for the 2 fit parameters we observe a variation on gamma ~ 12%. This only considering the empirical error of the 5 data points respect to the model. Moreover, the underlying assumption is that the errors are equally distributed for all the data points. So a more realistic estimation could be worst than that.

Just for test we can fit the same linear relation we fitted initially with linregress:



In [25]:

    
def residual2(params, x, data, eps_data=None):
    slope = params['slope'].value
    intercept = params['intercept'].value

    model = (intercept + slope*x)
    if eps_data is None:
        eps_data = np.ones_like(data)
    return (1./data-model)/eps_data

params2 = Parameters()
params2.add('intercept', value=2)
params2.add('slope', value=0.)

out2 = minimize(residual2, params2, args=(data.E_pr_fret, data.S_pr_fret))



In [26]:

    
ci = lmfit.conf_interval(out2)
report_ci(ci)









    



/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=-0.198576396493) = 0.9808242408 < max(sigmas).
  warn(errmsg)
/Users/anto/anaconda/lib/python2.7/site-packages/lmfit/confidence.py:289: UserWarning: Warning, maxiter=200 reachedand prob(slope=0.203423603507) = 0.980824277424 < max(sigmas).
  warn(errmsg)






    



             99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
    slope      -inf  -0.13627  -0.04863   0.03839   0.05348   0.14112       inf
intercept   1.52166   1.66272   1.71237   1.74130   1.77024   1.81989   1.96096



In [27]:

    
Omega = out2.values['intercept']
Sigma = out2.values['slope']

gamma = (Omega - 1)/(Omega + Sigma - 1)
gamma









    Out[27]:





0.9967412671491358

Alternative method

By definition the stoichiometry is independent from E:

$$ S = \frac{\gamma F_D + F_A}{\gamma F_D + F_A + F_{AA}}$$

Neglecting $\gamma$ we obtain a raw stoichiometry (analogous of the proxhimity ratio):

$$ S_R = \frac{F_D + F_A}{F_D + F_A + F_{AA}}$$

Changes of $S_R$ as a function of $E$ indicates $\gamma \ne 1$.

We can fit $\gamma$ by minimizing the variance of $S$. However, experimentally, we have $E_R$ and $S_R$ and not $F_D$, $F_A$ and $F_{AA}$. We can express $S_R$ as a function of $E_R$:

$$ S_R = \frac{1}{1 + \frac{F_{AA}}{F_D + F_A}} = \frac{1}{1 + E_R\frac{F_{AA}}{F_A}} = \frac{1}{1 + E_R k_2}$$

where $k_2 = F_{AA}/F_A$. $k_2$ as a function of $E_R$ and $S_R$ is:

$$ k_2 = \left(\frac{1}{S_R} - 1\right)/E_R$$

Using this last definition, we can write $S$ as:

$$S = \frac{1}{1 + E k_2} = \frac{1}{1 + g(E_R, \gamma) k_2} = f(\gamma, E_R, S_R)$$

where $g(\cdot)$ function that links $E$ and $E_R$. To find $\gamma$ we can minimize the variance of $S$ (i.e. $f(\cdot)$).

Let start defining $g(\cdot)$ and $k_2(\cdot)$:



In [28]:

    
def correct_E(Er, gamma):
    Er = np.asanyarray(Er)
    return Er/(gamma-gamma*Er+Er)

def k2(Er, Sr):
    return (1./Sr - 1)/Er

A direct brute-force search gives:



In [29]:

    
gamma_range = np.arange(0.9, 1.1, 0.00002)
y_std = np.zeros_like(gamma_range)
for i, gamma in enumerate(gamma_range):
    y = 1/(1 + correct_E(data.E_pr_fret, gamma)*k2(data.E_pr_fret, data.S_pr_fret))
    y_std[i] = y.std()
    #plot(y)
gamma_range[y_std == y_std.min()]









    Out[29]:





array([ 0.99696])

Expressing the fit as a minimization problem gives exactly the same result:



In [30]:

    
def residual(params, Er, Sr):
    gamma = params['gamma'].value
    S = 1/(1+correct_E(Er, gamma)*k2(Er, Sr))
    errors = S - S.mean()
    return errors

params = Parameters()
params.add('gamma', value=1.0)

out = minimize(residual, params, args=(data.E_pr_fret, data.S_pr_fret))



In [31]:

    
out.values['gamma']









    Out[31]:





0.99696873118951224



In [32]:

    
report_fit(params)









    



[[Variables]]
     gamma:     0.9969687 +/- 0.05088592 (5.10%) initial =  1
[[Correlations]] (unreported correlations are <  0.100)

Here we obtained that the standard deviation of the fitted $\gamma$ is around 5%, consistent with the previous estimations.

Fit plot



In [33]:

    
%matplotlib inline
from matplotlib.pyplot import *



In [34]:

    
plot(data.E_pr_fret, 1/data.S_pr_fret, 's')
x = np.arange(0, 1, 0.01)
plot(x, intercept + slope*x, 'k')
ylim(1, 2)
xlabel('E (uncorrected)')
ylabel('1/S (uncorrected)')









    Out[34]:





<matplotlib.text.Text at 0x108af1b50>



In [30]:



In [30]:



In [30]:

	n_bursts_all	n_bursts_do	n_bursts_fret	E_pr_fret	E_pr_fret_sig	S_pr_fret	S_pr_fret_sig	E_pr_do_kde	nt_mean	E_pr_fret_kde
sample
7d	1169	582	544	0.926413	0.002498	0.553918	0.004235	0.0014	22.265796	0.9302
12d	1304	329	945	0.731053	0.003017	0.565995	0.003207	0.0164	21.905752	0.7432
17d	2483	462	1958	0.427139	0.002573	0.543997	0.002362	0.0126	21.094356	0.4318
22d	2051	319	1670	0.183379	0.001969	0.549327	0.002511	0.0008	22.987313	0.1796
27d	787	160	586	0.083907	0.002890	0.562717	0.004169	-0.0068	16.941794	0.0844